Finite-state Multimodal Parsing and Understanding
Abstract
Multimodal interfaces require effective parsing and understanding of utterances whose content is distributed across multiple input modes. Johnston (1998) presents an approach in which strategies for multimodal integration are stated declaratively using a unification-based grammar that is used by a multidimensional chart parser to compose inputs. This approach is highly expressive and supports a broad class of interfaces, but offers only limited potential for mutual compensation among the input modes, is subject to significant concerns in terms of computational complexity, and complicates selection among alternative multimodal interpretations of the input. In this paper, we present an alternative approach in which multimodal parsing and understanding are achieved using a weighted finite-state device which takes speech and gesture streams as inputs and outputs their joint interpretation. This approach is significantly more efficient, enables tight coupling of multimodal understanding with speech recognition, and provides a general probabilistic framework for multimodal ambiguity resolution.
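To make the idea of a weighted finite-state device over speech and gesture streams more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy three-tape automaton whose arcs each carry a speech symbol, a gesture symbol, a meaning symbol, and a cost, and it finds the cheapest accepting path with a plain Dijkstra search. The grammar, the gesture symbols `G_person` and `G_org`, and the function name `interpret` are all hypothetical and chosen only for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

EPS = None  # epsilon: consume nothing on an input tape, or emit nothing

# Arc = (speech symbol, gesture symbol, meaning symbol, cost, next state)
Arc = Tuple[Optional[str], Optional[str], Optional[str], float, int]

# Hypothetical toy grammar: "<phone|email> this <person|organization|one>",
# where the deictic phrase must be accompanied by a pointing gesture.
ARCS: Dict[int, List[Arc]] = {
    0: [("phone", EPS, "phone", 0.0, 1),
        ("email", EPS, "email", 0.0, 1)],
    1: [("this", EPS, EPS, 0.0, 2)],
    2: [("person", "G_person", "person", 1.0, 3),
        ("organization", "G_org", "organization", 1.0, 3),
        # Speech alone is ambiguous here; the gesture decides the type.
        ("one", "G_person", "person", 1.0, 3),
        ("one", "G_org", "organization", 2.0, 3)],
}
FINAL = {3}


def interpret(speech: List[str],
              gesture: List[str]) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, meaning) of the cheapest joint parse, or None."""
    # Search configurations: (cost, state, speech pos, gesture pos, output).
    heap = [(0.0, 0, 0, 0, ())]
    seen = set()
    while heap:
        cost, state, i, j, out = heapq.heappop(heap)
        if state in FINAL and i == len(speech) and j == len(gesture):
            return cost, list(out)
        if (state, i, j) in seen:
            continue
        seen.add((state, i, j))
        for s, g, m, w, nxt in ARCS.get(state, []):
            # A non-epsilon label must match the next symbol on its tape.
            if s is not EPS and (i >= len(speech) or speech[i] != s):
                continue
            if g is not EPS and (j >= len(gesture) or gesture[j] != g):
                continue
            ni = i + (1 if s is not EPS else 0)
            nj = j + (1 if g is not EPS else 0)
            nout = out + ((m,) if m is not EPS else ())
            heapq.heappush(heap, (cost + w, nxt, ni, nj, nout))
    return None


if __name__ == "__main__":
    print(interpret(["phone", "this", "one"], ["G_org"]))
    print(interpret(["phone", "this", "person"], ["G_org"]))
```

In this sketch, the gesture resolves the underspecified spoken phrase "this one", and an utterance whose speech and gesture disagree has no accepting path. A production system would more likely build the same device by composing weighted transducers in a finite-state toolkit than by searching arcs directly.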
Related resources
Finite-state Methods for Multimodal Parsing and Integration
Finite-state machines have been extensively applied to many aspects of language processing, including speech recognition (Pereira and Riley, 1997; Riccardi et al., 1996), phonology (Kaplan and Kay, 1994; Karttunen, 1991), morphology (Koskenniemi, 1984), chunking (Abney, 1991; Joshi and Hopely, 1997; Bangalore, 1997), parsing (Roche, 1999), and machine translation (Bangalore and Riccardi, 2000)....
Multimodal Language Processing For
Interfaces for mobile information access need to allow users flexibility in their choice of modes and interaction style in accordance with their preferences, the task at hand, and their physical and social environment. This paper describes the approach to multimodal language processing in MATCH (Multimodal Access To City Help), a mobile multimodal speech-pen interface to restaurant and subway i...
Multimodal Dialogue System Grammars
We describe how multimodal grammars for dialogue systems can be written using the Grammatical Framework (GF) formalism. A proof-of-concept dialogue system constructed using these techniques is also presented. The software engineering problem of keeping grammars for different languages, modalities and systems (such as speech recognizers and parsers) in sync is reduced by the formal relationship ...
Integration of supra-lexical linguistic models with speech recognition using shallow parsing and finite state transducers
This paper proposes a layered Finite State Transducer (FST) framework integrating hierarchical supra-lexical linguistic knowledge into speech recognition based on shallow parsing. The shallow parsing grammar is derived directly from the full-fledged grammar for natural language understanding, and augmented with top-level n-gram probabilities and phrase-level context-dependent probabilities, whi...
Towards multimodal interaction with an intelligent room
There is great potential for combining speech and gestures to improve human-computer interaction, because this kind of communication increasingly resembles the natural communication humans use with each other every day. This paper therefore addresses multimodal interaction consisting of speech and gestures in an intelligent room. The advantages of using multimodal systems are explained an...